Interval-based Queries over Multiple Streams with Missing Timestamps
Abstract
Recognising patterns that correlate multiple events over time is becoming increasingly important in applications ranging from urban transportation to surveillance monitoring. In many real-world scenarios, however, the timestamps of events may be recorded erroneously, and events may be dropped from a stream due to network failures or load-shedding policies. In this work, we present SimpMatch, a novel simplex-based algorithm for the probabilistic evaluation of event queries using constraints over event orderings in a stream. Our approach avoids learning probability distributions for time-points or occurrence intervals. Instead, we employ the abstraction of segmented intervals and compute the probability of a sequence of such segments using the principle of order statistics. The algorithm runs in time linear in the number of missed timestamps and shows high accuracy, yielding exact results if event generation is based on a Poisson process and providing a good approximation otherwise. As we demonstrate empirically, SimpMatch enables efficient and effective reasoning over event streams, outperforming state-of-the-art methods for the probabilistic evaluation of event queries by up to two orders of magnitude.
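The order-statistics principle mentioned in the abstract can be illustrated with a small sketch. This is not the SimpMatch algorithm itself, which the abstract does not specify in detail. It only uses the textbook fact that, for a homogeneous Poisson process conditioned on n events in an interval, the event times are distributed as the order statistics of n independent uniform samples. The function name `prob_kth_before` and its parameters are hypothetical and chosen for illustration.

```python
from math import comb


def prob_kth_before(n: int, k: int, start: float, end: float, t: float) -> float:
    """P(the k-th earliest of n unobserved timestamps in [start, end] is <= t).

    Assumption (illustrative, not taken from the paper): the n timestamps
    are i.i.d. uniform on [start, end], which holds for a homogeneous
    Poisson process conditioned on n events in the interval.
    """
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    if end <= start:
        raise ValueError("interval must have positive length")
    # Fraction of the interval that lies before t, clamped to [0, 1].
    p = min(max((t - start) / (end - start), 0.0), 1.0)
    # The k-th order statistic is <= t iff at least k of the n points are <= t.
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))
```

For example, with two missing timestamps in the interval [0, 10], `prob_kth_before(2, 2, 0, 10, 5)` gives the probability that both fall in the first half, which is 0.25 under the uniformity assumption, while `prob_kth_before(2, 1, 0, 10, 5)` gives 0.75 for at least one of them doing so.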
Similar resources
A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries
Data streams are infinite, fast sequences of timestamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries over them are mostly one-pass. The execution of such algorithms faces challenges such as memory limitations, scheduling, and the accuracy of answers. They will be more ...
Point-Versus Interval-Based Temporal Data Models
The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-base...
Load Shedding for Temporal Queries over Data Streams
Enhancing continuous queries over data streams with temporal functions and predicates enriches the expressive power of those queries. While traditional continuous queries retrieve only the values of attributes, temporal continuous queries retrieve the valid time intervals of those values as well. Correctly evaluating such queries requires the coalescing of adjacent timestamps for value-equivale...
Query Languages and Data Models for Database Sequences and Data Streams
We study the fundamental limitations of relational algebra (RA) and SQL in supporting sequence and stream queries, and present effective query language and data model enrichments to deal with them. We begin by observing the well-known limitations of SQL in application domains which are important for data streams, such as sequence queries and data mining. Then we present a formal proof that, for...
Answering queries over incomplete data stream histories
Streams of data often originate from many distributed sources. A distributed stream processing system publishes such streams of data and enables queries over the streams. This allows users to retrieve and relate data from the distributed streams without needing to know where they are located. Stream data is important not only for its current values but also for past values produced. In order to...
Publication date: 2017